Morpheme-based Derivation of Bipolar Semantic Orientation of Chinese Words
Abstract
The evaluative character of a word is called its semantic orientation (SO). A positive SO indicates desirability (e.g., Good, Honest) and a negative SO indicates undesirability (e.g., Bad, Ugly). This paper presents a method, based on Turney (2003), for inferring the SO of a word from its statistical association with strongly polarized words and morphemes in Chinese. Morphemes are far less numerous than words, and a small number of fundamental morphemes can be used to great advantage in the modified system. The algorithm was tested on 1,249 words (604 positive and 645 negative) in a corpus of 34 million words. Run with 20 and 40 polarized words respectively, it gave high precision (79.96% to 81.05%) but low recall (45.56% to 59.57%). Run with 20 polarized morphemes, or single characters, in the same corpus, it gave both high precision (80.23%) and high recall (85.03%). We conclude that morphemes in Chinese, as in any language, constitute a distinct sub-lexical unit which, though small in number, has greater linguistic significance than words, as shown by the significant improvement in results with a much smaller corpus than that required by Turney.
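As a rough illustration of the association measure the abstract refers to, the sketch below computes a Turney-style SO score: the summed pointwise mutual information (PMI) between a target word and the positive seeds, minus the summed PMI with the negative seeds. The toy corpus, the seed morphemes, the smoothing constant `eps`, and the function names are all assumptions made for illustration; none of them are taken from the paper.

```python
import math

def pmi(a, b, windows, eps=0.01):
    """Smoothed pointwise mutual information of strings a and b.

    `windows` is a list of unsegmented text snippets; an item counts as
    present in a window if it occurs there as a substring.
    """
    n = len(windows)
    count_a = sum(1 for w in windows if a in w)
    count_b = sum(1 for w in windows if b in w)
    count_ab = sum(1 for w in windows if a in w and b in w)
    # eps is an assumed smoothing constant that avoids log(0)
    return math.log2(((count_ab + eps) * n) / ((count_a + eps) * (count_b + eps)))

def semantic_orientation(word, pos_seeds, neg_seeds, windows):
    """SO = sum of PMI with positive seeds - sum of PMI with negative seeds."""
    pos = sum(pmi(word, s, windows) for s in pos_seeds)
    neg = sum(pmi(word, s, windows) for s in neg_seeds)
    return pos - neg

# Hypothetical toy corpus and single-character seed morphemes
windows = ["这个人很诚实善良", "诚实是美德", "丑恶的谎言", "谎言可耻"]
pos_seeds = ["善", "美"]  # assumed positive morphemes
neg_seeds = ["恶", "耻"]  # assumed negative morphemes

so_honest = semantic_orientation("诚实", pos_seeds, neg_seeds, windows)
so_lie = semantic_orientation("谎言", pos_seeds, neg_seeds, windows)
print(so_honest > 0, so_lie < 0)
```

Because presence is tested by substring, a single-character seed matches inside every word that contains it, so it co-occurs with far more targets than a whole-word seed would. That is one plausible reading of the recall gain reported for morpheme seeds, not a claim about how the authors implemented their counts.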
Related resources
A Morpheme-based Method to Chinese Sentence-Level Sentiment Classification
Sentiment classification is a fundamental task in opinion mining. However, most existing systems require a sentiment lexicon to guide sentiment classification, which inevitably suffer from the problem of unknown words. In this paper, we present a morpheme-based fine-to-coarse strategy for Chinese sentence-level sentiment classification. To approach this, we first employ morphological productivi...
Identifying Different Meanings of a Chinese Morpheme through Latent Semantic Analysis and Minimum Spanning Tree Analysis
A character corresponds roughly to a morpheme in Chinese, and it usually takes on multiple meanings. In this paper, we aimed at capturing the multiple meanings of a Chinese morpheme across polymorphemic words in a growing semantic micro-space. Using Latent Semantic Analysis (LSA), we created several nested LSA semantic micro-spaces of increasing size. The term-document matrix of the smallest se...
Using Kohonen Maps of Chinese Morphological Families to Visualize the Interplay of Morphology and Semantics in Chinese
A morphological family in Chinese is the set of compound words embedding a common morpheme. Self-organizing maps (SOMs) of Chinese morphological families are built. Computing the unified-distance matrices for the SOMs allows us to perform a semantic clustering of the members of the morphological families. Such a semantic clustering sheds light on the interplay between morphology and semantic...
Fast automatic translation and morphological decomposition in Chinese-English bilinguals.
In this study, we investigated automatic translation from English to Chinese and subsequent morphological decomposition of translated Chinese compounds. In two lexical decision tasks, Chinese-English bilinguals responded to English target words that were preceded by masked unrelated primes presented for 59 ms. Unbeknownst to participants, the Chinese translations of the words in each critical p...
Morpheme-Enhanced Spectral Word Embedding
Traditional word embedding models learn only word-level semantic information from a corpus while neglecting the valuable semantic information in words' internal structures, such as morphemes. To address this problem, the goal of this paper is to exploit morphological information to enhance the quality of word embeddings. Based on a spectral method, we propose two word embedding models: Morpheme on ...